Similarity-Based Data Reduction Techniques

Authors

  • Gongde Guo
  • Hui Wang
  • David A. Bell
Abstract

The k-nearest neighbours (kNN) method is a simple but effective approach to classification. Its major drawbacks are (1) low efficiency and (2) dependency on the selection of a “good value” for k. In this paper, we propose a novel similarity-based data reduction method (SBModel), together with three variants, aimed at overcoming these shortcomings. Our method constructs a similarity-based model of the data, which replaces the data as the basis of classification. The value of k is determined automatically, varies with the local data distribution, and is optimal in terms of classification accuracy. Constructing the model significantly reduces the amount of data needed for classification, thus making classification faster. Experiments conducted on some public data sets show that SBModel and its variants compare well with C5.0, kNN, wkNN, and other data reduction methods in both efficiency and effectiveness.

ACM Classification: I.2.4 (Computing Methodologies – Artificial Intelligence – Knowledge Representation Formalisms And Methods); I.2.6 (Computing Methodologies – Artificial Intelligence – Learning); I.5.2 (Computing Methodologies – Pattern Recognition – Design Methodology)
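The abstract does not spell out the SBModel algorithm, so the following is only a rough sketch of the general idea it describes: replacing the training data with a small set of representatives, each covering a local neighbourhood whose size (the effective k) adapts to the data. The greedy covering rule, the Euclidean distance, and the names `build_model` and `classify` are all assumptions made for illustration, not the authors' method.

```python
import math

def build_model(points, labels):
    """Greedy reduction: keep one representative per same-class neighbourhood.

    Assumed rule (not taken from the paper): each point's neighbourhood is the
    run of nearest points that share its label, and representatives are chosen
    greedily to cover as many still-uncovered points as possible.
    Distance ties are not treated specially in this sketch.
    """
    n = len(points)
    neighbourhoods = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        members = [i]
        for j in others:
            if labels[j] != labels[i]:
                break  # stop at the first point of a different class
            members.append(j)
        radius = math.dist(points[i], points[members[-1]])
        neighbourhoods.append((radius, members))

    uncovered = set(range(n))
    model = []
    while uncovered:
        # Pick the point whose neighbourhood covers the most uncovered points.
        best = max(sorted(uncovered),
                   key=lambda i: len(uncovered.intersection(neighbourhoods[i][1])))
        radius, members = neighbourhoods[best]
        model.append({"centre": points[best], "label": labels[best],
                      "radius": radius, "size": len(members)})
        uncovered -= set(members)
    return model

def classify(model, x):
    """Label x using the reduced model instead of the full data set."""
    covering = [r for r in model if math.dist(x, r["centre"]) <= r["radius"]]
    if covering:
        # Prefer the representative that stands for the most training points.
        return max(covering, key=lambda r: r["size"])["label"]
    # Otherwise fall back to the representative with the closest boundary.
    nearest = min(model, key=lambda r: math.dist(x, r["centre"]) - r["radius"])
    return nearest["label"]

# Toy data: two well-separated clusters of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
model = build_model(points, labels)
```

On this toy data the eight training points reduce to two representatives, and new points are classified against those two alone, which is the source of the speed-up the abstract refers to. The quadratic neighbourhood search here is chosen for brevity and would need an index structure on larger data.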

Similar resources

A Geometric View of Similarity Measures in Data Mining

The main objective of data mining is to acquire information from a set of data for prospective applications using a measure. The issue of concern is that one often has to deal with large-scale data. Several dimensionality reduction techniques, such as various feature extraction methods, have been developed to resolve the issue. However, the geometric view of the applied measure, as an additional consid...

A novel method for detecting structural damage based on data-driven and similarity-based techniques under environmental and operational changes

The application of time series modeling and statistical similarity methods to structural health monitoring (SHM) provides promising and capable approaches to structural damage detection. The main aim of this article is to propose an efficient univariate similarity method named Kullback similarity (KS) for identifying the location of damage and estimating the level of damage severity. An impr...

A New Similarity Measure Based on Item Proximity and Closeness for Collaborative Filtering Recommendation

Recommender systems utilize information retrieval and machine learning techniques for filtering information and can predict whether a user would like an unseen item. User similarity measurement plays an important role in collaborative filtering based recommender systems. In order to improve the accuracy of traditional user-based collaborative filtering techniques under the new-user cold-start problem a...

Merging Similarity and Trust Based Social Networks to Enhance the Accuracy of Trust-Aware Recommender Systems

In recent years, collaborative filtering (CF) methods have become important and widely accepted techniques for recommender systems. One of these techniques is user-based: it produces useful recommendations based on the similarity of ratings given by like-minded users. However, these systems suffer from several inherent shortcomings such as data sparsity and cold-start problems. With the dev...

An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and measuring their similarity is a core part of the MTS analysis process. Many research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques available to estimate similarity between MTS, this field suffers from a lack of comparative...

Embedded Map Projection for Dimensionality Reduction-Based Similarity Search

We describe a dimensionality reduction method based on data point projection in an output space obtained by embedding the Growing Hierarchical Self Organizing Maps (GHSOM) computed from a training data set. The dimensionality reduction is used in a similarity search framework whose aim is to efficiently retrieve similar objects on the basis of the Euclidean distance among high dimensional featu...

Journal:
  • Journal of Research and Practice in Information Technology

Volume 37

Publication date: 2005